Lecture 24 - Docker for Data Science
conda, pipenv, virtualenv
Our container will include a bash shell, a git client, a Python interpreter, a Jupyter notebook server, a Quarto document processor, and a SQL database
docker run command
(bash as the default shell)

FROM image:tag
The Dockerfile starts with FROM, followed by the base image we want to use
The : character is used to specify the version of the image. In this case, we are using Ubuntu 24.04

RUN command
The RUN instruction executes any commands in a new layer on top of the current image and commits the results
Ubuntu's package manager is apt, which we can use to install software packages. The only commands we will need to use are apt-get update (only once) and apt-get install <package>
For example, to install git, we would run apt-get update && apt-get install -y git
Run apt-get clean and rm -rf /var/lib/apt/lists/* after installing the packages to remove the package cache and keep the image small

# Update and install dependencies
# Versions: https://packages.ubuntu.com/
RUN apt-get update && apt-get install -y \
bash=5.2.21-2ubuntu4 \
git=1:2.43.0-1ubuntu7.1 \
sqlite3=3.45.1-1ubuntu2 \
wget=1.21.4-1ubuntu4.1 \
python3=3.12.3-0ubuntu2 \
python3.12-venv \
python3-pip=24.0+dfsg-1ubuntu1.1 && \
apt-get clean && rm -rf /var/lib/apt/lists/*

RUN instruction
I used apt list --installed to see which packages are installed on my system and just copied them to the Dockerfile
If the base image already has Python and pip installed, you would only need to install the other packages
Next, install the Python libraries with pip3, such as numpy, pandas, jupyterlab, dask, and matplotlib
We use RUN instructions again
To find the installed versions, use pip show <package> | grep Version, or pip freeze > requirements.txt and then copy the versions from the file

wget
We use wget to download the binary
Quarto is not installed here with pip or apt, so we need to download it from the official website: https://quarto.org/docs/get-started/
wget is a command-line utility that allows you to download files from the web
Since the download is a .deb file (which is the package format for Ubuntu), we can install it with apt-get install <package> (like we did with the other packages)
You can download any file with wget, as long as you have the URL
JupyterLab listens on port 8888, so we will need to expose this port with the EXPOSE instruction
You can open bash inside the JupyterLab interface and have access to all the tools we installed in the container (like git, sqlite3, and Quarto) 😉

# Base image
FROM ubuntu:24.04
# Update and install system dependencies
RUN apt-get update && apt-get install -y \
bash=5.2.21-2ubuntu4 \
git=1:2.43.0-1ubuntu7.1 \
curl=8.5.0-2ubuntu10.5 \
sqlite3=3.45.1-1ubuntu2 \
wget=1.21.4-1ubuntu4.1 \
python3=3.12.3-0ubuntu2 \
python3.12-venv \
python3-pip=24.0+dfsg-1ubuntu1.1 && \
apt-get clean && rm -rf /var/lib/apt/lists/*
# Create and activate virtual environment
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Install Python dependencies in virtual environment
RUN pip install numpy==1.26.4 pandas==2.2.2 \
jupyterlab==4.2.5 ipykernel==6.29.5 \
dask==2024.11.2 matplotlib==3.9.2
# Install Quarto (this is the arm64 build; on x86-64 hosts use the linux-amd64 .deb instead)
RUN wget https://github.com/quarto-dev/quarto-cli/releases/download/v1.6.37/quarto-1.6.37-linux-arm64.deb && \
apt-get install -y ./quarto-1.6.37-linux-arm64.deb && \
rm quarto-1.6.37-linux-arm64.deb
# Create a directory for saving files
RUN mkdir -p /workspace
WORKDIR /workspace
# Expose port for JupyterLab
EXPOSE 8888
# Start JupyterLab
CMD ["sh", "-c", ". /opt/venv/bin/activate && jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root"]

Build the image with the docker build command
The -t flag is used to tag the image with a name, in this case qtm350-container
The . at the end of the command specifies the build context, which is the current directory

Run the container with the docker run command
Use the -p flag to map the port 8888 of the container to the port 8888 of the host machine
Use the -v flag to mount a volume in the container, so we can persist the notebooks outside the container
Here the -v flag is used to mount the current directory ($(pwd)) to the /workspace directory in the container

Use the LABEL instruction to add metadata to the container, such as the author, the version, and the description
Older Dockerfiles use the MAINTAINER instruction to specify the maintainer of the container; it is deprecated in favour of LABEL
View the metadata with the docker inspect command

# Metadata
LABEL version="1.0" \
description="Container with all tools covered in QTM 350" \
maintainer="Danilo Freire <danilo.freire@emory.edu>" \
license="MIT"

Use the COPY instruction to copy files from the host machine to the container
To stop the container, press Ctrl+C in the terminal where the container is running
Alternatively, use the docker ps command to see the list of running containers and then run the docker stop command with the container ID
Remove the container with the docker rm command and the image with the docker rmi command
To publish the image to a registry, use the docker tag and docker push commands

Summary
We used the FROM instruction to specify the base image, then used the RUN instruction to install the system packages and the Python libraries
We used the ENV instruction to set the PATH environment variable, the EXPOSE instruction to expose the port for the Jupyter notebook server, and the CMD instruction to start the Jupyter notebook server
We added metadata with the LABEL instructions
We built the image with the docker build command and ran it with the docker run command
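
The build, run, inspect, and clean-up commands named in this lecture can be sketched as a small shell script. This is a minimal sketch, not the lecture's own script: the commands are wrapped in functions so that nothing runs until you call one, and the function names, the variable names, the REGISTRY_USER placeholder, and the 1.0 tag are assumptions made for illustration. The image name, port, and mount point come from the Dockerfile and the text above.

```shell
#!/bin/sh
# Values taken from the lecture
IMAGE_NAME="qtm350-container"
PORT=8888
MOUNT_POINT="/workspace"
# Hypothetical registry account name; replace with your own
REGISTRY_USER="your-username"

# Build the image from the Dockerfile in the current directory (the "." build context)
build_image() {
  docker build -t "$IMAGE_NAME" .
}

# Start JupyterLab: publish the port and mount the current directory into the container
run_container() {
  docker run -p "${PORT}:${PORT}" -v "$(pwd):${MOUNT_POINT}" "$IMAGE_NAME"
}

# Print the image metadata, including the LABEL values
show_metadata() {
  docker inspect "$IMAGE_NAME"
}

# Stop and remove a container by ID (find the ID with `docker ps`)
stop_and_remove() {
  docker stop "$1" && docker rm "$1"
}

# Delete the image once no container uses it
remove_image() {
  docker rmi "$IMAGE_NAME"
}

# Tag the image for a registry account and upload it (tag 1.0 is an assumption)
publish_image() {
  docker tag "$IMAGE_NAME" "${REGISTRY_USER}/${IMAGE_NAME}:1.0" &&
    docker push "${REGISTRY_USER}/${IMAGE_NAME}:1.0"
}
```

After sourcing the script from the directory that contains the Dockerfile, call build_image and then run_container; JupyterLab prints the URL to open in the browser. Notebooks saved in /workspace appear in the current directory on the host because of the -v mount.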